Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets

نویسندگان

  • Leon Derczynski
  • Kalina Bontcheva
چکیده

Recognising entities in social media text is difficult. NER on newswire text is conventionally cast as a sequence labeling problem. This makes implicit assumptions regarding its textual structure. Social media text is rich in disfluency and often has poor or noisy structure, and intuitively does not always satisfy these assumptions. We explore noise-tolerant methods for sequence labeling and apply discriminative post-editing to exceed state-of-the-art performance for person recognition in tweets, reaching an F1 of 84%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Extended Study of Content and Crowdsourcing-related Performance Factors in Named Entity Annotation

Hybrid annotation techniques have emerged as a promising approach to carry out named entity recognition on noisy microposts. In this paper, we identify a set of content and crowdsourcing-related features (number and type of entities in a post, average length and sentiment of tweets, composition of skipped tweets, average time spent to complete the tasks, and interaction with the user interface)...

متن کامل

Online adaptation strategies for statistical machine translation in post-editing scenarios

One of the most promising approaches to machine translation consists in formulating the problem by means of a pattern recognition approach. By doing so, there are some tasks in which online adaptation is needed in order to adapt the system to changing scenarios. In the present work, we perform an exhaustive comparison of four online learning algorithms when combined with two adaptation strategi...

متن کامل

Recognizing Named Entities in Tweets

The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect globa...

متن کامل

Understandability of Machine-translated Hindi Tweets Before and After Post-editing: Perspectives for a Recommender System

In the process of building a recommender system based on Hindi tweets for a project, we want to determine whether raw Machine Translation (MT) results could be useful. We collected 100K such tweets and experimented on 200 of them as a preliminary step. Less than 50% of the machine-translated tweets were understandable by English speakers, while at least 80% understandability seems to be require...

متن کامل

Learning Character Representations for Chinese Word Segmentation

We propose a simple yet effective semi-supervised method for improving Chinese Word Segmentation. Our method is based on learning generalizable vector and cluster representations of variable-length character sequences from large unlabeled data, which is then incorporated into a sequence labeling model with the passive-aggressive algorithm as features. We achieve state-of-the-art results on the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014